-
Notifications
You must be signed in to change notification settings - Fork 27.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Improve multimodal processors - rely less on kwargs #28711
base: main
Are you sure you want to change the base?
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice - looking good!
Let me know when you want another review 🤗
@@ -39,7 +39,7 @@ def __init__( | |||
do_rescale: bool = True, | |||
rescale_factor: Union[int, float] = 1 / 255, | |||
do_normalize: bool = True, | |||
do_center_crop: bool = True, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the reason for changing the default here? I think BridgeTowerImageProcessor defaults to this being True
, so would have this value, even if not passed here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, was conflicted on this... this is the current version of preprocess
for bridgetower in main, missing a new variable declaration:
transformers/src/transformers/models/bridgetower/image_processing_bridgetower.py
Line 448 in 115ac94
do_center_crop if do_center_crop is not None else self.do_center_crop |
In that case, the
do_center_crop
that was used in preprocess
was the preprocess
default, i.e. None
instead of whatever was in the __init__
, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@amyeroberts on that, bridgetower did default do_center_crop
to being True, but preprocess
was not capturing it (it was a bug). The get_expected_values()
method does not include mentions on center_crop
and will crate expected uncropped values. From that
- Either change
...expected_values...
getter to something that includes cropping in logic - change default of tester to match previous behaviour
if len(args) > 0: | ||
images = args[0] | ||
args = args[1:] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's very nice to see this go :)
) | ||
# add pixel_values + pixel_mask | ||
print(size) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
print(size) |
valid_processor_keys = { | ||
"images", | ||
"do_resize", | ||
"size", | ||
"size_divisor", | ||
"resample", | ||
"do_rescale", | ||
"rescale_factor", | ||
"do_normalize", | ||
"image_mean", | ||
"image_std", | ||
"do_pad", | ||
"do_center_crop", | ||
"return_tensors", | ||
"data_format", | ||
"input_data_format", | ||
"pad_and_return_pixel_mask", | ||
} | ||
|
||
unused_keys = set(kwargs.keys()) - valid_processor_keys | ||
if unused_keys: | ||
unused_key_str = ", ".join(unused_keys) | ||
logger.info(f"Unused or unrecognized configuration parameters: {unused_key_str}.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two comments:
- What's the reason for this being added for some image processors but not others e.g. donut also accepts kwargs?
- Could we abstract out this check to something similar with Abstract image processor arg checks. #28843 using either inspect or an explicit class attribute?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good points!
- no reason at all - all for adding a similar logic to all!
inspect
is a bit slow, right? But yes, also +1 to abstracting this away, I'll move it to another PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@amyeroberts was thinking about decorators: wdyt about having this as a wrapper/decorator in image_utils or elsewhere in utils?
valid_processor_keys = inspect.getfullargspec(self.preprocess)[0]
unused_keys = set(kwargs.keys()) - valid_processor_keys
if unused_keys:
unused_key_str = ", ".join(unused_keys)
logger.info(f"Unused or unrecognized configuration parameters: {unused_key_str}.")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Perhaps instead of inspecting, we could have a class attribute with the valid_processor_keys? e.g. something like this:
class FooImageProcessor:
_valid_processor_keys = ['bar', 'baz']
...
def preprocess(..., kwargs):
validate_kwargs(self._valid_processor_keys, kwargs)
Having a class attribute like this is something I think I'm going to end up with in the #28847 design
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good yes! self-contained classes seem worth losing the decorator
**kwargs, | ||
) -> None: | ||
if "pad_and_return_pixel_mask" in kwargs: | ||
do_pad = kwargs.pop("pad_and_return_pixel_mask") | ||
if pad_and_return_pixel_mask: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's set the default value for size
in the init to avoid having mutables as default arguments
if pad_and_return_pixel_mask: | |
size = {"shortest_edge": 288} if size is None else size | |
if pad_and_return_pixel_mask: |
|
||
if text is not None: | ||
self.current_processor = self.tokenizer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Current processor behaviour is deprecated, so we don't need to set it here. In fact, we should probably create a current_processor
property which shows a deprecation message when used
self.current_processor = self.tokenizer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, that's good to know! Seen it in another instance I think, I'll drop it in that case and add the message
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's definitely still around in places. For context, we used to have a behaviour when the current_processor was selected through a context manager. The context manager behaviour was removed, but there's still remnants of this, even though current_processor
normally has no effect.
|
||
if text is not None: | ||
self.current_processor = self.tokenizer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment here about current processors
do_center_crop: Optional[bool] = True, | ||
do_pad: Optional[bool] = True, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These shouldn't be optional in the init
do_center_crop: Optional[bool] = True, | |
do_pad: Optional[bool] = True, | |
do_center_crop: bool = True, | |
do_pad: bool = True, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
absolutely, fixed in #28843 which should be merged before that one
What does this PR do?
This PR aims at a better control on the logic flow through
Processor
classes, in particular those leveragingImageProcessor
with aTokenizer
. Linked with #27768.ImageProcessors
compared toNougat
(as a reference point) have different signatures in their preprocess. One can list them hereThis helps standardize a bit in the first place, and then, will allow uniformizing
Processors
.Fixes # (issue)
Before submitting
Pull Request section?
to it if that's the case.
documentation guidelines, and
here are tips on formatting docstrings.
Who can review?
Models: